Firefly Monte Carlo: Exact MCMC with Subsets of Data

Authors

  • Dougal Maclaurin
  • Ryan P. Adams

Abstract

Markov chain Monte Carlo (MCMC) is a popular and successful general-purpose tool for Bayesian inference. However, MCMC cannot be practically applied to large datasets because of the prohibitive cost of evaluating every likelihood term at every iteration. Here we present Firefly Monte Carlo (FlyMC), an auxiliary variable MCMC algorithm that queries the likelihoods of only a potentially small subset of the data at each iteration, yet simulates from the exact posterior distribution, in contrast to recent proposals that are approximate even in the asymptotic limit. FlyMC is compatible with a wide variety of modern MCMC algorithms and requires only a lower bound on the per-datum likelihood factors. In experiments, we find that FlyMC generates samples from the posterior more than an order of magnitude faster than regular MCMC, opening up MCMC methods to larger datasets than were previously considered feasible.
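
The following is a rough illustration of the auxiliary-variable idea, not the authors' code. It makes several assumptions:

- The model is a one-parameter logistic regression with labels y_n in {-1, +1} and a standard normal prior.
- The per-datum lower bound B_n(θ) ≤ L_n(θ) is the bound of Jaakkola and Jordan, with a fixed, hypothetical tightness parameter XI.
- Each datum gets a binary variable z_n. A "dark" datum (z_n = 0) contributes B_n(θ) and a "bright" datum (z_n = 1) contributes L_n(θ) - B_n(θ).

Summing over z_n therefore gives back L_n(θ), so the θ-marginal of the augmented chain is the exact posterior. The function names, the random-walk Metropolis step, and the parameters `step` and `frac_z` are illustrative choices only.

```python
import math
import random

# Hypothetical tightness parameter for the bound (assumption, not from the paper);
# the bound is exact when y_n * x_n * theta = +/- XI.
XI = 1.5
LAM = math.tanh(XI / 2) / (4 * XI)


def log_sigmoid(s):
    # Numerically stable log(1 / (1 + exp(-s)))
    if s >= 0:
        return -math.log1p(math.exp(-s))
    return s - math.log1p(math.exp(s))


# Per-datum constant of the Jaakkola-Jordan bound
C0 = log_sigmoid(XI) - XI / 2 + LAM * XI * XI


def log_lik(theta, x, y):
    # log L_n(theta) for logistic regression with labels y in {-1, +1}
    return log_sigmoid(y * x * theta)


def log_bound(s):
    # Jaakkola-Jordan lower bound: log B_n <= log sigmoid(s) for every s
    return C0 + s / 2 - LAM * s * s


def log_bright_factor(theta, x, y):
    # log(L_n - B_n): the factor a bright (z_n = 1) datum contributes
    ll = log_lik(theta, x, y)
    gap = -math.expm1(log_bound(y * x * theta) - ll)  # 1 - B_n / L_n
    return ll + math.log(gap) if gap > 0 else -math.inf


def log_dark_sum(theta, d, s1, s2):
    # Sum of log B_n over d dark points, from sufficient statistics only:
    # s1 = sum(y_n * x_n), s2 = sum(x_n ** 2) over the dark set
    return C0 * d + 0.5 * theta * s1 - LAM * theta * theta * s2


def flymc(xs, ys, n_iter, step=0.3, frac_z=0.1, seed=0):
    rng = random.Random(seed)
    n_data = len(xs)
    tot_s1 = sum(y * x for x, y in zip(xs, ys))
    tot_s2 = sum(x * x for x in xs)

    def log_joint(theta, bright):
        # Unnormalized log p(theta, z); cost grows with |bright|, not n_data
        s1 = tot_s1 - sum(ys[n] * xs[n] for n in bright)
        s2 = tot_s2 - sum(xs[n] * xs[n] for n in bright)
        lp = -0.5 * theta * theta  # standard normal prior (assumption)
        lp += log_dark_sum(theta, n_data - len(bright), s1, s2)
        return lp + sum(log_bright_factor(theta, xs[n], ys[n]) for n in bright)

    theta, bright, samples = 0.0, set(), []
    for _ in range(n_iter):
        # 1) Random-walk Metropolis update of theta with z held fixed
        prop = theta + step * rng.gauss(0.0, 1.0)
        log_u = math.log(1.0 - rng.random())  # 1 - U lies in (0, 1]
        if log_u < log_joint(prop, bright) - log_joint(theta, bright):
            theta = prop
        # 2) Gibbs-resample a random subset of the z_n given theta:
        #    P(z_n = 1 | theta) = 1 - B_n(theta) / L_n(theta)
        for n in rng.sample(range(n_data), max(1, int(frac_z * n_data))):
            ll = log_lik(theta, xs[n], ys[n])
            p_bright = -math.expm1(log_bound(ys[n] * xs[n] * theta) - ll)
            if rng.random() < p_bright:
                bright.add(n)
            else:
                bright.discard(n)
        samples.append(theta)
    return samples, bright


if __name__ == "__main__":
    rng = random.Random(1)
    xs = [rng.gauss(0.0, 1.0) for _ in range(1000)]
    ys = [1 if rng.random() < 1 / (1 + math.exp(-2.0 * x)) else -1 for x in xs]
    samples, bright = flymc(xs, ys, n_iter=2000)
    kept = samples[500:]
    print("posterior mean:", sum(kept) / len(kept), "bright:", len(bright))
```

In this sketch the product of the bounds is exp of a quadratic in θ. The dark points therefore enter the θ update only through running sums of y_n x_n and x_n², and each joint evaluation touches the bright points alone. A point is bright with probability 1 - B_n/L_n, so fewer points light up wherever the bound is tight.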

Similar resources

Asymptotically Exact, Embarrassingly Parallel MCMC

Communication costs, resulting from synchronization requirements during learning, can greatly slow down many parallel machine learning algorithms. In this paper, we present a parallel Markov chain Monte Carlo (MCMC) algorithm in which subsets of data are processed independently, with very little communication. First, we arbitrarily partition data onto multiple machines. Then, on each machine, a...

Inference for Lévy Driven Stochastic Volatility Models Via Adaptive Sequential Monte Carlo

In the following paper we investigate simulation methodology for Bayesian inference in Lévy driven SV models. Typically, Bayesian inference from such statistical models is performed using Markov chain Monte Carlo (MCMC) methods. However, it is well-known that fitting SV models using MCMC is not always straightforward. One method that can improve over MCMC is SMC samplers ([14]), but in that ap...

Using Markov chain Monte Carlo for multipoint linkage analysis: Improved estimates of lod scores

The calculation of exact likelihoods from pedigree data is limited to datasets containing either a small number of meioses or a small number of linked genetic loci. In particular, the computation of likelihoods from data collected at multiple loci on large, extended pedigrees is infeasible. We perform multipoint linkage analysis on such datasets by estimating ratios of these otherwise intracta...

Parallel MCMC with generalized elliptical slice sampling

Probabilistic models are conceptually powerful tools for finding structure in data, but their practical effectiveness is often limited by our ability to perform inference in them. Exact inference is frequently intractable, so approximate inference is often performed using Markov chain Monte Carlo (MCMC). To achieve the best possible results from MCMC, we want to efficiently simulate many steps ...

Generalizing Elliptical Slice Sampling for Parallel MCMC

Probabilistic models are conceptually powerful tools for finding structure in data, but their practical effectiveness is often limited by our ability to perform inference in them. Exact inference is frequently intractable, so approximate inference is often performed using Markov chain Monte Carlo (MCMC). To achieve the best possible results from MCMC, we want to efficiently simulate many steps ...

Publication date: 2014